
ML-Density Estimation

Nonparametric Density Estimation

For a random vector $\mathbf{x}$ drawn from an unknown distribution $p(\mathbf{x})$, the probability that it falls into a small region $R$ of the space is

$$P = \int_R p(\mathbf{x})\,d\mathbf{x}$$

Given $N$ training samples, the number of samples $K$ falling into the region $R$ follows a binomial distribution:

$$P_K = \binom{N}{K} P^K (1 - P)^{N - K}$$

When $N$ is very large, the fraction of samples falling into $R$ concentrates around its expectation, so we can approximate

$$P \approx \frac{K}{N}$$

Assuming $R$ is small enough that $p(\mathbf{x})$ is approximately constant within it, and letting $V$ denote the volume of $R$:

$$P \approx p(\mathbf{x})\,V$$

Combining the two approximations gives the final estimate for $p(\mathbf{x})$:

$$p(\mathbf{x}) \approx \frac{K}{NV}$$

To estimate $p(\mathbf{x})$ accurately, $N$ must be large and $V$ as small as possible. In practice the number of samples is limited, and if the region is too small, few samples fall into it and the estimated density becomes unreliable.
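The estimate $p(\mathbf{x}) \approx K/(NV)$ can be checked numerically. A minimal sketch, assuming samples from a standard normal so the true density is known; the region half-width $h$ and the evaluation point are illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# N samples from a standard normal, whose true density at 0 is 1/sqrt(2*pi) ≈ 0.3989.
N = 100_000
samples = rng.standard_normal(N)

# Small region R = [x - h, x + h] around the evaluation point.
x, h = 0.0, 0.05
V = 2 * h                                       # "volume" (length) of R in one dimension
K = np.count_nonzero(np.abs(samples - x) < h)   # samples falling into R

p_hat = K / (N * V)
print(p_hat)                                    # close to the true value 0.3989
```

Shrinking $h$ with $N$ fixed makes the estimate noisier, exactly the trade-off described above.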

Two strategies follow from the estimate $p(\mathbf{x}) \approx K/(NV)$:

  • Fix the region size $V$ and count how many samples fall into each region — this gives the histogram and kernel methods.
  • Fix the count $K$ and grow each region until it contains exactly $K$ samples — this is the K-nearest-neighbour method.
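The K-nearest-neighbour idea can be sketched in a few lines. This is an illustrative implementation, not from the notes; the choice $K = 50$ and the standard-normal test data are assumptions:

```python
import numpy as np

def knn_density(x, samples, K):
    """Fix K; grow the region around x until it contains exactly K samples."""
    dists = np.sort(np.abs(samples - x))
    r = dists[K - 1]            # distance to the K-th nearest sample
    V = 2 * r                   # 1-D "volume": the interval [x - r, x + r]
    return K / (len(samples) * V)

rng = np.random.default_rng(1)
samples = rng.standard_normal(10_000)
print(knn_density(0.0, samples, K=50))   # roughly 0.4 for a standard normal
```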

Histograms as density models

For low-dimensional data we can use a histogram as a density model.

Histograms

  • How wide should the bins be? (the bin width acts as a regulariser)
  • Do we want the same bin-width everywhere?
  • Do we believe the density is zero for empty bins?
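As a sketch of the histogram approach (using NumPy; the bin count and test data are arbitrary choices), each bar height is $K_i/(N \cdot \text{width})$, so the bars integrate to one:

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.standard_normal(5_000)

# density=True normalises counts to K_i / (N * bin_width).
heights, edges = np.histogram(samples, bins=30, density=True)

width = edges[1] - edges[0]
print(np.sum(heights * width))   # the histogram integrates to 1.0
```

Changing `bins` is exactly the bin-width question from the bullets above: few wide bins oversmooth, many narrow bins leave empty bins with estimated density zero.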

Kernel Density Estimation (KDE)

1. Definition

Kernel Density Estimation (KDE) is a non-parametric method to estimate the probability density function (PDF) of a random variable.


2. KDE Formula

$$p(x) = \frac{1}{N}\sum_{n=1}^{N} \frac{1}{\sqrt{2\pi}\,H}\exp\!\left(-\frac{(x - x_n)^2}{2H^2}\right)$$

  • $p(x)$: estimated density at point $x$.
  • $N$: total number of data points.
  • $H$: bandwidth, controlling the smoothness of the estimate.
  • $x_n$: the data points.
  • The kernel function $\phi$ is typically Gaussian:

    $$\phi\!\left(\frac{x - x_n}{H}\right) = \frac{1}{\sqrt{2\pi}\,H}\exp\!\left(-\frac{(x - x_n)^2}{2H^2}\right)$$

3. Steps to Compute KDE

  1. For each data point xn, calculate the distance from the target point x.
  2. Apply the kernel function to determine the weight of each data point.
  3. Sum the contributions from all data points and normalize by N.
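The three steps translate directly into NumPy. A minimal vectorised sketch (the function name and the standard-normal test data are mine, not from the notes):

```python
import numpy as np

def kde(x, data, H):
    """Gaussian kernel density estimate at the points in x."""
    x = np.atleast_1d(x)[:, None]          # shape (M, 1), broadcasts against data
    z = (x - data) / H                     # step 1: scaled distances to each x_n
    phi = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * H)   # step 2: kernel weights
    return phi.mean(axis=1)                # step 3: sum contributions, divide by N

rng = np.random.default_rng(0)
data = rng.standard_normal(1_000)
print(kde(0.0, data, H=0.3))               # close to 0.4 for a standard normal
```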

4. Example

Data

We have 5 data points:

$x = \{1.0,\ 1.5,\ 2.0,\ 3.0,\ 3.5\}$

We want to estimate the density at $z = 2.5$, using:

  • bandwidth $H = 0.5$,
  • a Gaussian kernel.

Calculation

For each $x_n$, calculate:

$$\phi\!\left(\frac{z - x_n}{H}\right) = \frac{1}{\sqrt{2\pi}\,H}\exp\!\left(-\frac{(z - x_n)^2}{2H^2}\right)$$

  • For $x_1 = 1.0$:

    $$\phi\!\left(\tfrac{2.5 - 1.0}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 1.0)^2}{2\cdot 0.5^2}\right) \approx 0.008$$

  • For $x_2 = 1.5$:

    $$\phi\!\left(\tfrac{2.5 - 1.5}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 1.5)^2}{2\cdot 0.5^2}\right) \approx 0.107$$

  • For $x_3 = 2.0$:

    $$\phi\!\left(\tfrac{2.5 - 2.0}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 2.0)^2}{2\cdot 0.5^2}\right) \approx 0.483$$

  • For $x_4 = 3.0$:

    $$\phi\!\left(\tfrac{2.5 - 3.0}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 3.0)^2}{2\cdot 0.5^2}\right) \approx 0.483$$

  • For $x_5 = 3.5$:

    $$\phi\!\left(\tfrac{2.5 - 3.5}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 3.5)^2}{2\cdot 0.5^2}\right) \approx 0.107$$
Combine Contributions

The total density at $z = 2.5$ is:

$$p(z) = \frac{1}{N}\sum_{n=1}^{N}\phi\!\left(\frac{z - x_n}{H}\right)$$

Substituting the values:

$$p(z) = \frac{1}{5}(0.008 + 0.107 + 0.483 + 0.483 + 0.107) = \frac{1}{5}\cdot 1.188 \approx 0.238$$
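The hand calculation can be reproduced numerically as a quick check, using the same five points, $z = 2.5$, and $H = 0.5$:

```python
import numpy as np

data = np.array([1.0, 1.5, 2.0, 3.0, 3.5])
z, H = 2.5, 0.5

# One Gaussian kernel value per data point.
phi = np.exp(-((z - data) ** 2) / (2 * H**2)) / (np.sqrt(2 * np.pi) * H)
print(phi.mean())   # ≈ 0.2385, consistent with the hand calculation
```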

5. Advantages of KDE

  • Flexible: Does not assume a specific distribution of data.
  • Smooth: Produces a continuous estimate.

6. Challenges of KDE

  • Bandwidth H: Choosing an appropriate H is critical.
    • Small H: May overfit, capturing noise.
    • Large H: May oversmooth, losing details.
  • Computationally Expensive: Requires evaluating kernel functions for all data points.

Summary

In this example, the estimated density at $z = 2.5$ is $p(z) \approx 0.238$. Kernel Density Estimation is a powerful tool for non-parametric density estimation, but it requires careful parameter tuning.